Determining scaling factor for each accelerometer using the 6-O method

Script summary

Script purpose: This script determines calibration (scaling) values for each accelerometer.

Overview: X/Y/Z values are scaled according to the 6-orientation method described by Garde et al. 2022 (doi: 10.1111/2041-210X.13804).

Inputs: Summary of calibration times (i.e. when tags were in which position) and accelerometry data files recorded during calibration.

Outputs: A .csv file with a scaling factor for each axis (x, y and z) for each tag ID (tag_6O_scaling_factors.csv).

Background

In theory, the raw values recorded by each accelerometer (tag) should be consistent. In practice, they differ between tags and between manufacturers, so we need to calibrate and scale the values before further analysis.

Garde et al. 2022 (doi 10.1111/2041-210X.13804) outline a method for calibrating each accelerometry axis: the 6-O method, which involves checking the acceleration values of the tag in 6 different orientations. We collected calibration data according to this method in May 2022, before deploying the tags. All of the tags were turned on, attached firmly to a metal clipboard, and the board was held in each of the 6 orientations (on the desk or against the wall) for at least 10 seconds. We used a spirit level to confirm that the boards were level in each orientation. The tags could not all fit on a single board, so this was repeated for 4 boards (B1-B4).

Positions (and expected acceleration values for each axis) were as follows:

  • Position 1: tag upside down and horizontal (Z2, x=0, y=0, z=1)
  • Position 2: tag right side up and horizontal (Z1, x=0, y=0, z=-1)
  • Position 3: tag on its side, “head” on the right (Y2, x=0, y=1, z=0)
  • Position 4: rotated clockwise, head down (X2, x=1, y=0, z=0)
  • Position 5: rotated clockwise, head left (Y1, x=0, y=-1, z=0)
  • Position 6: rotated clockwise, head up (X1, x=-1, y=0, z=0)
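
The six expected gravity vectors above can be expressed as a small lookup table (a sketch for reference; the object name and layout are illustrative, not part of the calibration files):

```r
# Expected static acceleration (in g) for each of the six orientations.
# Orientation codes (X1, X2, ...) match the position descriptions above.
expected_orientations <- data.frame(
  orientation = c("Z2", "Z1", "Y2", "X2", "Y1", "X1"),
  x = c(0,  0, 0, 1,  0, -1),
  y = c(0,  0, 1, 0, -1,  0),
  z = c(1, -1, 0, 0,  0,  0)
)

# A stationary, perfectly calibrated tag should read a vectorial sum of
# exactly 1 g in every orientation:
sqrt(expected_orientations$x^2 +
     expected_orientations$y^2 +
     expected_orientations$z^2)
```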

The code below could be neater and more concise (e.g. see the code associated with Aulsebrook, Jacques-Hamilton and Kempenaers 2024, Animal Behaviour), but it gets the job done.

Setting up workspace

library(data.table)
library(dplyr)
library(ggplot2)
library(glue)
library(lubridate)
library(pander)
library(stringr)
library(tidyr)
input_dir <- "../data/acc-calibration-data/"
output_dir <- "../outputs/calibration-outputs/"
plot_output_dir <- paste0(output_dir,"clockdrift-plots/")

sr <- 25 # sampling rate
scaling_window <- 2 # duration of scaling window in seconds
plot_drift <- TRUE # plotting drift makes script slower; not essential for script outputs

source("./general-helper-functions.R") 

The tags record data in UTC, but calibration times were recorded in local time (AKDT, UTC-8).
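
As a quick illustration of the conversion involved (a sketch with a made-up timestamp; in the script itself the time zone is handled via the `tz` argument to `as.POSIXct()` and the `dt_to_acc()` helper sourced above):

```r
# A calibration time written down in local Alaska daylight time (AKDT, UTC-8)...
local_time <- as.POSIXct("2022-05-15 10:30:00", tz = "US/Alaska")

# ...corresponds to a UTC clock time 8 hours later, which is the
# timescale the tags record on:
format(local_time, tz = "UTC", usetz = TRUE)  # "2022-05-15 18:30:00 UTC"
```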

cal_times <- fread(glue("{input_dir}acc_nano_calibration_times.csv")) %>%
  mutate(tag_ID=tolower(as.character(tag_id)))

cal_times <- cal_times %>%
  rename(x1_start=pos6_start_time,
         x2_start=pos4_start_time,
         y1_start=pos5_start_time,
         y2_start=pos3_start_time,
         z1_start=pos2_start_time,
         z2_start=pos1_start_time)

cal_times <- cal_times %>%
  mutate(across(time_on:time_off, 
                ~ paste(date, .x))) %>%
  mutate(across(c(time_on,time_off), 
                ~ as.POSIXct(.x, format="%d/%m/%Y %H:%M", tz="US/Alaska")),
         across(z2_start:clock_cal_start_time, 
                ~ as.POSIXct(.x, format="%d/%m/%Y %H:%M:%S", tz="US/Alaska")))

cal_times <- cal_times %>%
  mutate(across(time_on:time_off, ~ dt_to_acc(.x)))
filenames <- as.data.frame(list.files(input_dir,
                                      pattern = "\\.csv$",
                                      recursive=TRUE)) %>%
  setNames("filename") %>%
  filter(filename!="acc_nano_calibration_times.csv")

filenames$tag_ID <- tolower(
  as.character(str_sub(filenames$filename,10,13))) 

cal_times <- full_join(cal_times,filenames,by='tag_ID') %>%
  mutate(filename=paste0(input_dir,filename))

Determine clock error (drift) associated with calibration recordings

Each orientation for calibration was held for at least 10 seconds. In theory, if we add 4 seconds to the recorded calibration time and use the next 2 seconds for the calibration, we can be reasonably certain that the data are “clean” and represent the correct orientation.

However, if the accelerometer clock was out by more than 7 seconds, some further adjustment is required.

During each calibration recording, the tag was moved up and down 5 times at a recorded timepoint. This is used to check that the accelerometer clock was running on time. Here, I plot each of these clock calibration points, to determine how far the clock drifted from real time.

If the accelerometer clock is precisely on time, the movement should begin at 0 seconds after the recorded calibration start time. I manually checked the saved plots to see how far off each accelerometer clock was. If a clock was out by more than 7 seconds, the calibration times need further adjustment. The code below is not the neatest, but it does the job for what we need.
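
The drift plots were generated along these lines (a hedged sketch only; the function name and 30-second window are illustrative, and the actual plotting is gated by the `plot_drift` flag set above):

```r
library(dplyr)
library(ggplot2)

# Sketch: plot raw z-axis acceleration around the recorded clock-calibration
# time for one tag. If the tag clock is on time, the 5 up-and-down movements
# should begin at 0 on this axis; any offset is the clock drift.
plot_clock_drift <- function(dat, clock_cal_start, window_secs = 30) {
  dat %>%
    filter(time >= clock_cal_start - window_secs,
           time <= clock_cal_start + window_secs) %>%
    mutate(secs_since_cal = as.numeric(difftime(time, clock_cal_start,
                                                units = "secs"))) %>%
    ggplot(aes(x = secs_since_cal, y = z)) +
    geom_line() +
    geom_vline(xintercept = 0, linetype = "dashed") +
    labs(x = "Seconds since recorded clock calibration start",
         y = "Raw z-axis acceleration") +
    theme_classic()
}
```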

drift_secs <- c(-3,-2,-2,-2,-1,-1,-2,-1,-2,-3,
                -3,-2,-2,-2,-2,-2,-2,-2,-2,-1,
                -2,-4,-1,-2,-2,0,-2,-2,-1,-2,
                20,18,0,0,0,0,20,18,20,0,
                0,0,0,0,0,18,0,-1,20,20,
                20,0,0,18,18,20,-1,0,18,0,
                22,20,20,20,20,18,20,22,18,17,
                22,0,0,20,22,0,22,20,22,22,
                0,0,0,20,-1,22,22,0,20,22,
                6,8,6,6,10,7,3,4,9,5)

tag_ID <- c('0208','0213','0214','0215','0216','0221','0235','0246','0247','0248',
            '0249','0252','0254','0255','0257','0260','0261','0262','0266','0267',
            '020a','020c','022c','023a','023d','024b','025b','026a','026b','026c',
            '0217','0222','0223','0225','0228','0233','0237','0239','0240','0241',
            '0258','0263','0265','0268','0269','021a','021b','021e','021f','022a',
            '022b','022e','022f','023c','023f','024a','024c','025a','025d','026f',
            '0206','0211','0212','0218','0224','0226','0227','0229','0230','0231',
            '0234','0236','0244','0245','0250','0251','0253','0256','021c','021d',
            '022d','023b','023e','024d','024e','025c','025e','025f','026d','026e',
            '0210','0219','0220','0232','0238','0243','0259','0264','0270','020d')

tag_drifts <- data.frame(drift_secs=drift_secs,
                         tag_ID=tag_ID)

cal_times <- left_join(cal_times, tag_drifts, by='tag_ID')

rm(drift_secs, tag_ID)

Below, calibration times are all adjusted to account for buffer and drift. They can now be used as they are to extract scaling values.

get_cal_time <- function(position_start, buffer_secs, drift_secs) {
  result = position_start + buffer_secs + drift_secs
  return(result)
}

adj_cal_times <- cal_times %>%
  mutate(across(z2_start:x1_start, 
                ~ get_cal_time(.x, buffer_secs=4, drift_secs=drift_secs)))

For one tag (026B), there are 3 minutes of data missing (2022-06-08 03:17:29.526 to 2022-06-08 03:20:04.546). This time period falls during the Z2 position calibration, which started at 2022-06-08 03:17:30. So for this tag, we need to bump the calibration a little earlier (to 03:17:27, accounting for drift), and manually confirm that the device was stationary (and that the data can be used for calibration).

info_026b <- adj_cal_times %>%
  filter(tag_ID=="026b")

dat_026b <- fread(info_026b$filename, header=TRUE) %>%
  filter(time>=as.POSIXct("2022-06-08 03:17:27", format="%Y-%m-%d %H:%M:%S", tz="UTC") &
         time<=as.POSIXct("2022-06-08 03:17:29", format="%Y-%m-%d %H:%M:%S", tz="UTC"))

max(dat_026b$x/1000)-min(dat_026b$x/1000)
## [1] 0.012
max(dat_026b$y/1000)-min(dat_026b$y/1000)
## [1] 0.015
max(dat_026b$z/1000)-min(dat_026b$z/1000)
## [1] 0.023
info_0208 <- adj_cal_times %>%
  filter(tag_ID=="0208")

dat_0208 <- fread(info_0208$filename, header=TRUE) %>%
  filter(time>=info_0208$z2_start & time<=info_0208$z2_start+2)

max(dat_0208$x/1000)-min(dat_0208$x/1000)
## [1] 0.016
max(dat_0208$y/1000)-min(dat_0208$y/1000)
## [1] 0.019
max(dat_0208$z/1000)-min(dat_0208$z/1000)
## [1] 0.031

The range of values for each axis is similar for the uncertain tag and for a tag that we know was stationary. I am therefore comfortable using the values from this time window to calibrate 026b.

adj_cal_times[adj_cal_times$tag_ID=='026b']$z2_start <- as.POSIXct("2022-06-08 03:17:27",
                                                                   format="%Y-%m-%d %H:%M:%OS",
                                                                   tz="UTC")

Get median vectorial sum for each axis

Unlike the original paper, which uses the maximum acceleration for calibration, we chose to use the median acceleration, which is more robust to small accidental movements and other artefacts.
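
A tiny simulated example of why the median is preferred here (illustrative values only, not from the calibration data):

```r
# 2 s of stationary data at 25 Hz, resting flat on the z axis (z ~ 1 g),
# with a single accidental knock partway through.
set.seed(1)
z <- rnorm(50, mean = 1, sd = 0.005)
z[25] <- 1.4  # brief bump

vec_sum <- sqrt(0^2 + 0^2 + z^2)
max(vec_sum)     # inflated by the bump
median(vec_sum)  # still close to 1 g
```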

convert_raw_acc_to_g <- function(x) {x/1000}

calc_med_vec_sum <- function(x,y,z) {
  med_vec_sum = median(sqrt(x^2+y^2+z^2))
  return(med_vec_sum)
}

get_scaling_values <- function(filename, scale_start, window_length){
  # filename = complete path of data file
  # scale_start = posix datetime of the start of the calibration window
  # window_length = duration of calibration window in seconds
  if(is.na(scale_start)){
    return(as.numeric(NA))
  }
  dat <- fread(filename, header=TRUE)
  subbed_data <- dat %>%
    filter(time>=scale_start & time<= scale_start+window_length)
  subbed_data_g <- subbed_data %>%
    mutate(across(c(x,y,z), convert_raw_acc_to_g, .names="acc_{.col}"))
  summary_output <- subbed_data_g %>%
    summarise(across(c(acc_x,acc_y,acc_z), median, .names="median_{.col}"),
              med_vec_sum=calc_med_vec_sum(acc_x,acc_y,acc_z))
  return(summary_output)
}

summarise_scaling_values <- function(df) {
# apply to a df with 3 columns, in order: 
  # 1: filename
  # 2: cal start time (posix or "%Y-%m-%d %H:%M:%S"),
       # must begin with 2 character position ID e.g. x1_start
  # 3: tag id
  summary <- do.call(
    rbind, apply(df, 1, function(x)
      get_scaling_values(x[1],
                         as.POSIXct(x[2],
                                    format="%Y-%m-%d %H:%M:%S", 
                                    tz="UTC"),
                         scaling_window) %>%
        rename_with(~ paste0(substr(colnames(df[,2]),1,2), "_", .)) %>%
        mutate(tag_ID=x['tag_ID'])
    )
  )
  return(summary)
}
scaling_x1 <- summarise_scaling_values(adj_cal_times[,c('filename','x1_start','tag_ID')])
scaling_x2 <- summarise_scaling_values(adj_cal_times[,c('filename','x2_start','tag_ID')])
scaling_y1 <- summarise_scaling_values(adj_cal_times[,c('filename','y1_start','tag_ID')])
scaling_y2 <- summarise_scaling_values(adj_cal_times[,c('filename','y2_start','tag_ID')])
scaling_z1 <- summarise_scaling_values(adj_cal_times[,c('filename','z1_start','tag_ID')])
scaling_z2 <- summarise_scaling_values(adj_cal_times[,c('filename','z2_start','tag_ID')])
    
scaling_data <- full_join(scaling_x1, scaling_x2, by='tag_ID') %>%
    full_join(., scaling_y1, by='tag_ID') %>%
    full_join(., scaling_y2, by='tag_ID') %>%
    full_join(., scaling_z1, by='tag_ID') %>%
    full_join(., scaling_z2, by='tag_ID')
# The scaling factor is what values in this axis are multiplied by
scaling_values <- scaling_data %>%
  rowwise() %>%
  mutate(x_scaling_factor = 1 / mean(c(x1_med_vec_sum, x2_med_vec_sum)),
         y_scaling_factor = 1 / mean(c(y1_med_vec_sum, y2_med_vec_sum)),
         z_scaling_factor = 1 / mean(c(z1_med_vec_sum, z2_med_vec_sum))) %>%
  ungroup () %>%
  select(tag_ID, x_scaling_factor, y_scaling_factor, z_scaling_factor)
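
As a worked example of the factor calculation above (with made-up readings, not values from the data):

```r
# Suppose one tag's x axis gave these median vectorial sums in its
# two x orientations:
x1_med_vec_sum <- 0.98  # head-up orientation
x2_med_vec_sum <- 1.00  # head-down orientation

# The scaling factor is the reciprocal of their mean...
x_scaling_factor <- 1 / mean(c(x1_med_vec_sum, x2_med_vec_sum))
x_scaling_factor  # ~1.0101; raw x values are multiplied by this

# ...so after scaling, the mean of the two readings sits at 1 g:
mean(c(x1_med_vec_sum, x2_med_vec_sum)) * x_scaling_factor
```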

scaling_values %>%
  select(-tag_ID) %>%
  summarise_all(.funs = c(min="min",max="max"))
## # A tibble: 1 × 6
##   x_scaling_factor_min y_scaling_factor_min z_scaling_factor_min
##                  <dbl>                <dbl>                <dbl>
## 1                0.990                0.979                0.985
## # ℹ 3 more variables: x_scaling_factor_max <dbl>, y_scaling_factor_max <dbl>,
## #   z_scaling_factor_max <dbl>
cal_times_scaling <- full_join(adj_cal_times, scaling_values, by="tag_ID")

Re-calculate stationary values after scaling

apply_scaling <- function(dat, scaling_factor){
  result <- dat*scaling_factor
  return(result)
}

calc_scaled_medians <- function(filename, dat_start, window_length, 
                                x_factor, y_factor, z_factor){
  # filename = complete path of data file
  # dat_start = posix datetime of the start of the data window to scale
  # window_length = duration of data window in seconds
  if(is.na(dat_start)){
    return(as.numeric(NA))
  }
  dat <- fread(filename, header=TRUE)
  subbed_data <- dat %>%
    filter(time>=dat_start & time<= dat_start+window_length)
  subbed_data_g <- subbed_data %>%
    mutate(across(c(x,y,z), convert_raw_acc_to_g, .names="acc_{.col}"),
           scaled_acc_x = apply_scaling(acc_x, x_factor),
           scaled_acc_y = apply_scaling(acc_y, y_factor),
           scaled_acc_z = apply_scaling(acc_z, z_factor))
  summary_output <- subbed_data_g %>%
    summarise(across(c(acc_x,acc_y,acc_z), median, .names="median_{.col}"),
              across(c(scaled_acc_x, scaled_acc_y, scaled_acc_z), median, .names="median_{.col}"),
              med_vec_sum=calc_med_vec_sum(acc_x,acc_y,acc_z),
              scaled_med_vec_sum=calc_med_vec_sum(scaled_acc_x,scaled_acc_y,scaled_acc_z))
  return(summary_output)
}

summarise_scaled_values <- function(df) {
# apply to a df with 6 columns, in order: 
  # 1: filename
  # 2: window start time (posix or "%Y-%m-%d %H:%M:%S"),
       # must begin with 2 character position ID e.g. x1_start
  # 3: tag id
  # 4: x_scaling_factor
  # 5: y_scaling_factor
  # 6: z_scaling_factor
  summary <- do.call(
    rbind, apply(df, 1, function(x)
      calc_scaled_medians(x[1],
                         as.POSIXct(x[2],
                                    format="%Y-%m-%d %H:%M:%S", 
                                    tz="UTC"),
                         scaling_window,
                         as.numeric(x[4]),
                         as.numeric(x[5]),
                         as.numeric(x[6])) %>%
        rename_with(~ paste0(substr(colnames(df[,2]),1,2), "_", .)) %>%
        mutate(tag_ID=x['tag_ID'])
    )
  )
  return(summary)
}
scaled_x1 <- summarise_scaled_values(
  cal_times_scaling[,c('filename','x1_start','tag_ID',
                       'x_scaling_factor','y_scaling_factor','z_scaling_factor')])
scaled_x2 <- summarise_scaled_values(
  cal_times_scaling[,c('filename','x2_start','tag_ID',
                       'x_scaling_factor','y_scaling_factor','z_scaling_factor')])
scaled_y1 <- summarise_scaled_values(
  cal_times_scaling[,c('filename','y1_start','tag_ID',
                       'x_scaling_factor','y_scaling_factor','z_scaling_factor')])
scaled_y2 <- summarise_scaled_values(
  cal_times_scaling[,c('filename','y2_start','tag_ID',
                       'x_scaling_factor','y_scaling_factor','z_scaling_factor')])
scaled_z1 <- summarise_scaled_values(
  cal_times_scaling[,c('filename','z1_start','tag_ID',
                       'x_scaling_factor','y_scaling_factor','z_scaling_factor')])
scaled_z2 <- summarise_scaled_values(
  cal_times_scaling[,c('filename','z2_start','tag_ID',
                       'x_scaling_factor','y_scaling_factor','z_scaling_factor')])
scaled_data <- full_join(scaled_x1, scaled_x2, by='tag_ID') %>%
    full_join(., scaled_y1, by='tag_ID') %>%
    full_join(., scaled_y2, by='tag_ID') %>%
    full_join(., scaled_z1, by='tag_ID') %>%
    full_join(., scaled_z2, by='tag_ID')

scaled_data_long <- scaled_data %>%
  select(!tag_ID) %>%
  pivot_longer(cols=everything(), names_to="measure", values_to ="value") %>%
  mutate(orientation=substr(measure,1,2),
         measure=substr(measure,4,nchar(measure)))
ggplot(data = scaled_data_long %>% filter(measure=="med_vec_sum"|measure=="scaled_med_vec_sum"), 
                         aes(x=orientation, y=value)) + 
  geom_boxplot(outlier.shape=NA, aes(colour=measure)) +
  #geom_jitter(alpha=0.5, width=0.2, aes(colour=measure))+
  theme_classic()+
  #facet_wrap(~measure)+
  labs(x="\n Tag position",y="Acceleration median vectorial sum\n")+
  scale_y_continuous(n.breaks=6)+
  theme(plot.margin = unit(c(0,0,0,0),"cm"),
        axis.title.x = element_text(size=12),
        axis.title.y = element_text(size=12),
        axis.text.x = element_text(size=10, colour="black"),
        axis.text.y = element_text(size=10, colour="black"),            
        axis.ticks.length = unit(0.1,"cm"))

In the plot, note that if the two positions for an axis were offset in different directions, then one position will appear to improve (get closer to 1 g) after scaling, while the other will appear slightly worse. This is expected; overall, when the positions are averaged together, we should still see some improvement.
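
A quick numeric illustration of this effect (made-up readings):

```r
# Two opposite orientations of the same axis, offset in different directions:
pos_a <- 0.96  # reads low
pos_b <- 1.02  # reads high
scaling_factor <- 1 / mean(c(pos_a, pos_b))  # 1 / 0.99

scaled <- c(pos_a, pos_b) * scaling_factor
# pos_a moves closer to 1 g (appears to improve) while pos_b moves
# further away (appears worse), but the axis mean is now 1 g
# (up to floating point):
mean(scaled)
```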

scaled_data_plot <- scaled_data_long %>%
  filter(measure=="med_vec_sum"|measure=="scaled_med_vec_sum") %>%
  mutate(scaled = case_when(measure=="med_vec_sum" ~ "unscaled",
                            measure=="scaled_med_vec_sum" ~ "scaled"))

scaled_data_plot$scaled <- factor(scaled_data_plot$scaled, levels=c("unscaled","scaled"))

ggplot(data = scaled_data_plot, aes(x=scaled, y=value)) + 
  geom_boxplot(outlier.shape=NA) +
  #geom_jitter(alpha=0.5, width=0.2)+
  theme_classic()+
  #facet_wrap(~measure)+
  labs(x="\n",y="Acceleration median vectorial sum\n")+
  scale_y_continuous(n.breaks=6)+
  theme(plot.margin = unit(c(0,0,0,0),"cm"),
        axis.title.x = element_text(size=12),
        axis.title.y = element_text(size=12),
        axis.text.x = element_text(size=10, colour="black"),
        axis.text.y = element_text(size=10, colour="black"),            
        axis.ticks.length = unit(0.1,"cm"))

From the plot of unscaled versus scaled acceleration, the accelerometers were already well-calibrated, but scaling very slightly reduces the variance.

Save scaling data

write.csv(scaling_values, 
          file=glue('{output_dir}tag_6O_scaling_factors.csv'),
          row.names = FALSE)

R Session Info

sessionInfo() %>% pander()

R version 4.3.2 (2023-10-31)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: glue(v.1.6.2), lubridate(v.1.9.2), stringr(v.1.5.1), tidyr(v.1.3.0), pander(v.0.6.5), ggplot2(v.3.4.3), dplyr(v.1.1.3) and data.table(v.1.14.8)

loaded via a namespace (and not attached): gtable(v.0.3.4), jsonlite(v.1.8.7), compiler(v.4.3.2), tidyselect(v.1.2.0), Rcpp(v.1.0.11), jquerylib(v.0.1.4), scales(v.1.2.1), yaml(v.2.3.7), fastmap(v.1.1.1), R6(v.2.5.1), labeling(v.0.4.3), generics(v.0.1.3), knitr(v.1.44), tibble(v.3.2.1), bookdown(v.0.35), munsell(v.0.5.0), bslib(v.0.6.0), pillar(v.1.9.0), rlang(v.1.1.2), utf8(v.1.2.4), stringi(v.1.8.2), cachem(v.1.0.8), xfun(v.0.41), sass(v.0.4.7), timechange(v.0.2.0), cli(v.3.6.1), withr(v.2.5.2), magrittr(v.2.0.3), rmdformats(v.1.0.4), digest(v.0.6.33), grid(v.4.3.2), rstudioapi(v.0.15.0), lifecycle(v.1.0.4), vctrs(v.0.6.4), evaluate(v.0.23), farver(v.2.1.1), fansi(v.1.0.5), colorspace(v.2.1-0), purrr(v.1.0.2), rmarkdown(v.2.25), tools(v.4.3.2), pkgconfig(v.2.0.3) and htmltools(v.0.5.7)